Fig. 1. Block diagram of the FLMS algorithm.
Transposes and Hermitians (complex conjugate transposes) are denoted by T and H,
respectively. The symbol $\langle \cdot , \cdot \rangle$ denotes the inner product, and diagonal
matrices are denoted with a tilde.

The FOBA is different from the FLMS algorithm [5] in that it uses
a time-varying step size. The taps of a frequency-domain FIR filter
are adjusted to minimize the mean of the magnitude squared error.
The N-point filter output $y_k$ is obtained using the overlap-and-save
technique [6], i.e.,

$$ y_k = T_L \tilde{X}_k T_F A_k \tag{3} $$

where $A_k$ is the 2N-point complex filter weight vector, $X_k$ is the
DFT of the 2N-point vector containing the present and previous block
of input data, and

$$ T_F = F \begin{bmatrix} I & O \\ O & O \end{bmatrix} F^{-1}, \qquad T_L = F \begin{bmatrix} O & O \\ O & I \end{bmatrix} F^{-1}. $$

The 2N x 2N DFT matrix is given by F, and O is an N x N zero
matrix. The constrained gradient is obtained by taking the inverse
FFT, setting the last N elements to zero, and taking the forward
FFT, i.e.,



$$ g_k = [\text{first } N \text{ terms of IFFT}\{-2 \tilde{X}_k^H E_k\}] $$

$$ G_k = \text{FFT}\,[\, g_k^T \mid 0^T \,]^T $$
where 0 is the N-point null vector. A compact form of the gradient
is given by



$$ G_k = -2 T_F \tilde{X}_k^H E_k \tag{5a} $$

where

$$ E_k = T_L (D_k - \tilde{X}_k T_F A_k) \tag{5b} $$



is the 2N-point frequency-domain filter error vector and $D_k$ is the
DFT of the 2N-point vector containing the present and previous block
of the desired signal. The coefficient update is based on the complex
LMS algorithm [12], i.e.,

$$ A_{k+1} = A_k - \mu_k G_k \tag{6} $$

where $\mu_k$ is the adaptive step size for the kth block. An optimal adaptive
step size for the constrained FLMS was given in [8], [9] by



$$ \mu_k^{o} = \frac{N \langle g_k, g_k \rangle}{2 \langle c_k, c_k \rangle} \tag{7} $$
where

$$ c_k = [\text{last } N \text{ terms of IFFT}\{\tilde{X}_k G_k\}] \tag{8} $$

or, in compact form,

$$ c_k = F^{-1} T_L \tilde{X}_k G_k. \tag{9} $$



The adaptive step size is optimal in the sense that it minimizes an 
estimate of the block mean-square error from one iteration to the next. 
The derivation of the step size is given in [8]. Equations (3)-(9)
describe the FOBA.
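
The iteration described by (3)-(9) can be sketched in NumPy as follows. This is a minimal illustration for real-valued data, not the authors' implementation: the function name `foba_block` and its arguments are hypothetical, the windowing matrices are applied as slicing and zero-padding in the time domain, and the step size is written as $\langle g_k, g_k \rangle / (2 \langle c_k, c_k \rangle)$, which is the value that minimizes the block squared error under NumPy's FFT scaling. Any additional constant factor in (7) depends on the DFT normalization, which is an assumption here.

```python
import numpy as np

def foba_block(a_freq, x_2n, d_blk, r=1):
    """Run r FOBA-style updates on one 2N-point input buffer (hypothetical helper).

    a_freq : 2N-point frequency-domain weights A_k
    x_2n   : previous and present input blocks (2N real samples)
    d_blk  : present block of the desired signal (N real samples)
    """
    n = len(d_blk)
    x_freq = np.fft.fft(x_2n)
    e = np.zeros(n)
    for _ in range(r):
        # Overlap-and-save: keep the last N output samples.
        y = np.fft.ifft(x_freq * a_freq)[n:].real
        e = d_blk - y
        e_freq = np.fft.fft(np.concatenate([np.zeros(n), e]))
        # Constrained gradient: keep the first N time-domain terms.
        g = np.fft.ifft(-2.0 * np.conj(x_freq) * e_freq)[:n].real
        g_freq = np.fft.fft(np.concatenate([g, np.zeros(n)]))
        # c_k: last N terms of IFFT{X_k G_k}.
        c = np.fft.ifft(x_freq * g_freq)[n:].real
        denom = 2.0 * np.dot(c, c)
        if denom == 0.0:
            break  # zero gradient: nothing to update
        mu = np.dot(g, g) / denom  # step size under NumPy's FFT scaling (assumed)
        a_freq = a_freq - mu * g_freq
    return a_freq, e
```

Calling `foba_block` once per disjoint block corresponds to m = N; passing r > 1 reuses the same buffer for several consecutive updates.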

III. Convergence and Computational Complexity
In this section, we propose methods for improving the convergence
speed and reducing the computational complexity of the FOBA.
These involve replacing an FFT with the LMS spectrum analyzer
[10], using packing and pruning FFT algorithms, and adopting
simplified matrix-vector products. The 2N-point vector $x_k$, which
contains the present and previous blocks of input data, is updated m
samples at a time. It can be shown [9] that if m < N, i.e., blocks
are overlapped, the convergence speed of the FOBA improves. Block
overlapping implies data reusing, and when m = 1, the convergence
speed of the FOBA improves considerably [9]. Data reusing occurs
because some of the samples in consecutive blocks are the same,
and hence they are reused to adapt the coefficients in consecutive
iterations. The idea of data reusing is exploited further here by
performing r iterations between each new input sample when m = 1.
For large r, significant improvement in convergence rate is realized at
the expense of increased computation. The merit of this data-reusing
scheme is demonstrated via a system identification simulation with
N = 32 (see Fig. 2). In this simulation, the input $x_k$ is white Gaussian
noise (zero mean and unit variance) which drives an Nth-order fixed
filter F(·). The normalized error energy (NEE) is used to evaluate
this simulation. This is given by

Results are shown for (a) m = N, r = 1, (b) m = 1, r = 1, (c)
m = 1, r = 10, and (d) m = 1, r = 100. Fig. 2 shows that as r
increases, the convergence speed improves. For large r, convergence
time was generally found to approach N samples. Note that the
improvement observed here is in terms of convergence time, and
not misadjustment. In fact, it is well known [1] that the convergence
speed of LMS-type algorithms can be improved only at the expense
of increased misadjustment. Hence, the result of Fig. 2 is consistent
with the theory of convergence of LMS algorithms.

Methods to reduce the computational complexity of the FOBA
are now presented. Diagonal matrix-vector multiplications require
only N + 1 products since the frequency-domain data are complex
conjugate symmetric. When m = 1, the FFT of $x_k$ may be replaced
by the adaptive "LMS spectrum analyzer" ($\mu$ = 0.5) described by
Widrow et al. [10]. This algorithm provides an "adaptive" running
DFT of the input for each new sample. The algorithm is described
below, i.e.,



NEE = 



(10) 



Vk = [1 e 
,V' mi (m > 0)



= V k W k 



(11) 



(12) 



(13) 



Fig. 2. Comparison of results from FOBA simulation. The curves
correspond to the following: (a) m = N, r = 1, (b) m = 1, r = 1, (c) m = 1,
r = 10, and (d) m = 1, r = 100.

where $V_k$ is a vector of harmonic phasors, $W_k$ is a vector of complex
adaptive weights, $X_k$ is the transform of the last 2N samples of the
input $x_k$, and $j = \sqrt{-1}$. These results can be simplified further by
expressing the complex weight update equation (12) as

$$ W_{k+1} = W_k - x_{k-2N} V_{k-2N}^{*} + x_k V_k^{*}, \qquad k > 2N. \tag{14} $$



Since $V_k$ is periodic with period 2N, $V_k = V_{k-2N}$ and (14)
becomes



$$ W_{k+1} = W_k + (x_k - x_{k-2N}) V_k^{*}, \qquad k > 2N. \tag{15} $$

Substituting (15) into (13) and simplifying yields 

$$ X_k = \tilde{V}_1 X_{k-1} + V_1 (x_k - x_{k-2N}). \tag{16} $$

Thus, when m = 1, the FFT of $x_k$ may be replaced by the modified
"steady flow DFT" (SFDFT) given by (16), which requires only 2N
complex multiplications and 2N + 1 complex additions. If $x_k$ is real,
only N + 1 complex multiplications and N + 2 complex additions are
required. $V_1$ may be recognized as a twiddle factor vector required
by FFT algorithms. Therefore, this technique introduces no additional
storage requirements.
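
The recursion in (16) can be checked numerically with a short NumPy sketch. The helper names `sfdft_update` and `twiddles` are hypothetical, and the window is assumed to hold the most recent 2N samples with the oldest sample first. The common factor $V_1$ is applied once to the sum, which matches the count of 2N complex multiplications.

```python
import numpy as np

def twiddles(two_n):
    """Twiddle vector V_1 with elements exp(j*2*pi*i/(2N)), i = 0..2N-1."""
    return np.exp(2j * np.pi * np.arange(two_n) / two_n)

def sfdft_update(x_freq, x_new, x_old, v1):
    """One sliding-DFT step in the form of (16).

    x_freq : DFT of the previous 2N-sample window
    x_new  : sample entering the window
    x_old  : sample leaving the window (2N samples older than x_new)
    v1     : twiddle vector returned by twiddles(2N)
    """
    # One scalar subtraction, 2N additions, 2N complex multiplications.
    return v1 * (x_freq + (x_new - x_old))
```

Starting from an ordinary FFT of the first 2N samples, repeated calls track the DFT of the sliding window one sample at a time; roundoff accumulates slowly over many updates, which is relevant to the fixed-point discussion later in the paper.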

The bulk of the remaining computations required by the FOBA
are contained in five other FFT's. The SFDFT cannot be applied for
these transformations since all the time-domain samples are updated
on a block-by-block basis. They may be simplified, however, by
eliminating unnecessary computations. The overlap-and-save technique
requires that half the inputs to the FFT's used to compute $E_k$
and $G_k$ be zero. From results presented by Skinner [13], it may be
shown that the number of computations required to perform these
FFT's is reduced by 50% when the DIT FFT algorithm is "pruned"
to avoid performing butterflies with zero inputs. The overlap-and-save
technique also requires that half the output points from the
FFT's used to compute $y_k$, $g_k$, and $c_k$ be ignored. Since butterfly
diagrams of the DIF and DIT FFT algorithms are essentially mirror
images of each other (see Fig. 3), Skinner's results may be applied to
prune the output of the DIF algorithm to avoid computing the unused
output points [14]. If the samples $x_k$ are real valued, then all FFT's
operate on either real or complex conjugate symmetric vectors. It has
been shown [15] that an N-point FFT may be used to compute the
transform of a 2N-point real sequence. O(N) computations are then
required to "unpack" the 2N-point complex result. The pruned FFT
Table I
Computations/Sample (m = N, r = 1)

Algorithm                Real Multiplications^a Per Sample    Real Additions^a Per Sample
Complex FOBA             24 log2 N + 54                       36 log2 N + 54
Complex FOBA (PFFT's)    14 log2 N + 44                       21 log2 N + 39
Real FOBA                12 log2 N + 39 + 38/N                18 log2 N + 57 + 55/N
Real FOBA (PFFT's)       7 log2 N + 29 + 28/N                 10.5 log2 N + 37 + 35/N

^a Considering that one complex multiply = four real multiplications and two real adds.
Fig. 3. Gradient constraint via packing and pruning techniques.

(PFFT) algorithms can be used for the N-point FFT required above
for approximately 75% reduction in computation. As an example,
Fig. 3 illustrates how the packing and pruning technique is used to
constrain the gradient when N = 4.

This is the windowing process shown in (3) and (4). The eight
elements of the unconstrained frequency-domain gradient $G_u$ are
"packed" (yielding $G'_u$) into the input bins of the four-point FFT.
The inverse FFT is performed, and unnecessary computations are
eliminated. Odd elements of the time-domain gradient vector appear
as the imaginary part, and even elements as the real part. The forward
FFT is then performed, yielding $G'$, which is the "packed" version
of the constrained frequency-domain gradient. Table I illustrates the
computational complexity of the FOBA with ordinary and pruned
FFT's (PFFT's). The complexity is described in terms of the additions
and multiplications required per input sample. Both real- and
complex-valued input data are considered. As shown in Table I, the use of
PFFT's reduces the complexity of the FOBA significantly.
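
The packing idea, in which a 2N-point real transform is obtained from one N-point complex FFT followed by O(N) unpacking, can be sketched as below. This is the standard textbook construction and is not necessarily the exact procedure of [15]; the function name `real_fft_packed` is hypothetical, and pruning is not shown.

```python
import numpy as np

def real_fft_packed(x):
    """DFT of a real sequence of even length 2N using one N-point complex FFT."""
    x = np.asarray(x, dtype=float)
    two_n = len(x)
    n = two_n // 2
    # Pack: even-indexed samples -> real part, odd-indexed -> imaginary part.
    z = np.fft.fft(x[0::2] + 1j * x[1::2])
    # conj(Z[(N - k) mod N]) for k = 0..N-1.
    z_rev = np.conj(np.roll(z[::-1], 1))
    # Unpack in O(N): transforms of the even and odd subsequences.
    even = 0.5 * (z + z_rev)
    odd = -0.5j * (z - z_rev)
    w = np.exp(-2j * np.pi * np.arange(n) / two_n)
    first_half = even + w * odd    # bins 0 .. N-1
    second_half = even - w * odd   # bins N .. 2N-1
    return np.concatenate([first_half, second_half])
```

Because the result is conjugate symmetric, only bins 0 through N need to be kept in practice, which is the source of the N + 1 product count quoted for diagonal matrix-vector multiplications.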

When m = 1, we have shown that the convergence speed may
be significantly improved at the expense of increased computational
complexity. This becomes evident when the entries of Table I are
compared against those of Table II. Notice that in both tables the
complexity is expressed in terms of computations per input sample as
opposed to input block. This allows direct comparison of the FOBA
and the data-reusing FOBA. Table II shows the effect of the two
convergence improvements discussed (i.e., block overlapping and r
extra iterations per sample) on the computational requirements of the
FOBA. Table II also illustrates the computational savings obtained
when m = 1 and the SFDFT is used in place of an FFT.





Fig. 4. Effects of fixed-point arithmetic on FOBA adaptation. 



IV. Effects of Fixed-Point Arithmetic 
This section examines some of the problems associated with the
implementation of the FOBA on a fixed-point signal processor. The
algorithm can become numerically unstable, even when finite-precision
floating-point schemes are used. Fixed-point arithmetic
complicates this problem because it is more susceptible to roundoff,
overflow, and underflow error than floating-point arithmetic. The
set of fixed-point machine numbers is more dense than the set of
floating-point numbers (assuming the same word length). However,
the maximum representable magnitude is generally much smaller and
the minimum representable magnitude is generally much larger in
fixed-point than in floating-point arithmetic. Although values within
this range are represented more accurately in fixed-point arithmetic,
operations whose result has a different order of magnitude than
its operands can result in overflow, underflow, or roundoff. Fig. 4
shows the effect of fixed-point arithmetic on the adaptation of the
FOBA. These results are from a system identification simulation with
[...] and r = 1, using 16-, 24-, and 32-bit fixed-point
and 32-bit floating-point arithmetic. For the 24-bit case, the Motorola
DSP56000 Mixed Number format [16] was used, i.e., an 8-bit integer
and a 16-bit fraction.
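
To make the range and resolution trade-off concrete, the following sketch quantizes values to a signed format with an 8-bit integer part and a 16-bit fraction. It is an illustrative model only: two's complement with the sign bit counted in the integer part, round-to-nearest, and saturation on overflow are all assumptions, and the function name `to_fixed` is hypothetical rather than part of any DSP56000 toolchain.

```python
import numpy as np

def to_fixed(x, int_bits=8, frac_bits=16):
    """Round to a signed fixed-point grid and saturate at the range limits."""
    scale = 2.0 ** frac_bits
    lo = -(2.0 ** (int_bits - 1))                # most negative value
    hi = 2.0 ** (int_bits - 1) - 1.0 / scale     # most positive value
    q = np.round(np.asarray(x, dtype=float) * scale) / scale
    return np.clip(q, lo, hi)
```

With these assumptions, magnitudes below about 2^-17 round to zero (underflow) and magnitudes of 128 or more saturate (overflow), which are the two failure modes discussed in this section.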

Several adjustments were made to prevent fixed-point overflow
and underflow. The input sequences $d_k$ and $x_k$ were normalized to
prevent overflow. The FOBA in this case was modified to prevent
some of the adverse effects of fixed-point arithmetic. The normalized
FFT was used to control the magnitude of vectors in the frequency
domain. All the FFT's were replaced by normalized FFT's except
for the one used to compute $X_k$. The computation of $\mu_k$ shown
in (7) involves dividing one inner product by another. The inner
product in the denominator can be expressed as a matrix product with
four terms, two of which depend on the filter error. When the filter
error is small, underflow can cause this result to become zero. When
the filter error is large, the inner products may result in fixed-point
overflow. To avoid overflow and underflow, each term of each inner
product summation was scaled by $\max_i c_k^2(i)$, $N \le i < 2N$. If either
inner product resulted in underflow, $\mu_k$ was not updated. Clearly,
additional computational overhead (2N additional multiplies) and
roundoff error were introduced in order to safely calculate $\mu_k$. For
high-order filters, this is not a significant increase. When processor
speed limitations require that lower order filters be used, a fixed $\mu_k$
may be necessary.
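
A minimal sketch of this safeguard is given below. It assumes that the scale factor is the largest squared element of $c_k$ and that underflow is detected as a zero sum. The function `scaled_step_size` and its `quantize` hook are hypothetical, and the ratio is written without any normalization constant from (7).

```python
import numpy as np

def scaled_step_size(g, c, quantize=lambda v: v):
    """Ratio <g,g> / (2 <c,c>) with every squared term divided by max c^2.

    Returns None when the step size should not be updated.
    """
    g = np.asarray(g, dtype=float)
    c = np.asarray(c, dtype=float)
    s = np.max(c ** 2)                       # assumed scale factor
    if s == 0.0:
        return None
    num = np.sum(quantize(g ** 2 / s))       # scaled numerator terms
    den = np.sum(quantize(c ** 2 / s))       # largest scaled term equals 1
    if num == 0.0 or den == 0.0:
        return None                          # underflow: keep the previous step size
    return num / (2.0 * den)
```

Because both sums are divided by the same factor, the ratio is unchanged in exact arithmetic; the scaling only moves the individual terms into a range that a fixed-point format can represent.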
Table II
Computations/Sample (m = 1)

Algorithm Modification       Real Multiplications^a Per Sample           Real Additions^a Per Sample
Complex FOBA (r = 1)         24N log2 N + 54N                            36N log2 N + 54N
Real FOBA (r = 1)            12N log2 N + 39N + 38                       18N log2 N + 57N + 55
Real FOBA (SFDFT)            10rN log2 N + (35r + 0.5)N + 34r + 1        15rN log2 N + (49r + 0.5)N + 46r + 2
Real FOBA (SFDFT, PFFT's)    5rN log2 N + (35r + 0.5)N + 34r + 1         7.5rN log2 N + (49r + 0.5)N + 46r + 2

^a Considering that one complex multiply = four real multiplications and two real adds.
Fig. 5. Adaptive noise canceller.

V. The FOBA for Adaptive Noise Cancellation 
The FOBA is used here for adaptive noise cancellation (Fig. 5)
in conjunction with the SFDFT (when m = 1). The weights are
adjusted to minimize the difference between the primary input and
the adaptive filter output. When the speech is uncorrelated with the
noise, this difference is a "best" least-squares estimate of the speech
signal [1].

Experimental results were obtained using a white noise generator
in an anechoic chamber. The microphones were one foot apart. The
two inputs were sampled at 8 kHz and quantized to 11 b. Filtering
was done off-line using both floating-point and fixed-point arithmetic.
The phrase, "Good evening, I'm Ted Koppel and this is Nightline.
Tonight ..." was spoken into the primary microphone. Fig. 6 shows
the segmental energy (SE) of the noise canceller output and that of
the primary input. This is obtained from

$$ SE(x, k) = \sum_{n} x^2(n) \tag{17} $$

where x is the signal being evaluated. The original (noisy) signal
is shown as segmental energy (a). The FOBA (N = 128, m = 1,
r = 1) with floating-point arithmetic was used to adjust the filter
weights, resulting in segmental energy (d). The noise energy [compare
(a) and (d)] is shown to be attenuated by approximately 25 dB.
Segmental energies (b) and (c) were generated using the 24-bit
DSP56000 Mixed Number fixed-point format. For the data-reusing
FOBA, background noise grew noticeably louder as time elapsed [see
segmental energy (b)]. This illustrates the effect of roundoff error
accumulation when the data-reusing schemes discussed earlier are
implemented on a fixed-point signal processor. When disjoint blocks
were used (m = 128), segmental energy (c) was obtained, and this
effect was not as evident. Approximately 10 dB noise reduction was
achieved in the latter case.
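
A segmental energy curve of the kind compared above can be computed along the following lines. The segment length, the use of non-overlapping segments, the decibel scale, and the name `segmental_energy_db` are all assumptions made for illustration; the limits of the sum in (17) are not specified here.

```python
import numpy as np

def segmental_energy_db(x, seg_len=256, floor=1e-12):
    """Energy of consecutive non-overlapping segments of x, in dB."""
    x = np.asarray(x, dtype=float)
    n_seg = len(x) // seg_len
    segs = x[:n_seg * seg_len].reshape(n_seg, seg_len)
    energy = np.sum(segs ** 2, axis=1)
    # The floor avoids log10(0) for silent segments.
    return 10.0 * np.log10(np.maximum(energy, floor))
```

Subtracting the curve of the canceller output from that of the primary input, segment by segment, gives the attenuation in dB during noise-only intervals.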

VI. Concluding Remarks 
In this paper, we studied implementation issues associated with 
the fixed-point realization of a frequency-domain adaptive algorithm. 
In particular, we suggested methods for improving the convergence 
Fig. 6. Comparison of segmental signal energies. The curves correspond to
the following: (a) segmental energy for noisy speech, (b) segmental energy of
the output of a fixed-point ANC with overlapping blocks (N = 128, m = 1,
r = 1), (c) segmental energy of the output of a fixed-point ANC with disjoint
blocks (N = 128, m = 128, r = 1), and (d) segmental energy of the output
of a floating-point ANC with overlapping blocks (N = 128, m = 1, r = 1).



speed and reducing the computational complexity of the
Frequency-Domain Optimum Block Algorithm (FOBA). Improvements in the
convergence speed (in terms of data samples) were realized by
performing r weight updates using the same block of data. Simulations
using floating-point arithmetic, white noise inputs, and large r have
generally shown that the number of samples required for the FOBA
to identify an FIR filter of the same order was roughly equal to the
length of its impulse response. The computational complexity of the
FOBA was reduced by using the SFDFT and packing and pruning
FFT's.

We also examined the proposed methods in a more realistic
scenario by implementing a frequency-domain adaptive noise canceller
that uses the FOBA. Results were given for fixed- and
floating-point arithmetic. Although convergence improvements, using the
aforementioned methods, were evident using floating-point
arithmetic, fixed-point implementation introduced fixed-point roundoff
error accumulation which had an adverse effect on the performance
of the noise canceller. Experiments also revealed that the computation
of $\mu_k$ is sensitive to fixed-point implementation.
Abstract: An exact formula for the output noise spectrum of a double
[...] modulators, which do not
overload over the entire input amplitude range.
[...] the noise spectrum from the correlation function. The present [...]

Introduction

The quantization noise spectrum is an important characteristic
of analog-to-digital (A/D) and digital-to-analog (D/A) converters.
However, for oversampled converters such as sigma-delta
modulators, the exact determination of this spectrum is made difficult
by the presence of a nonlinear quantizer in the feedback loop.
To overcome this difficulty, most previous analyses are based on
replacing the quantizer with an additive white-noise source, which
[...] for the system, which can be easily
analyzed by standard linear system techniques [1], [2]. This paper
presents, without the use of the white-noise model and under the
no-overloading assumption, an exact formula for the quantization
noise spectrum for a double-loop sigma-delta coder with a sinusoidal
input.

The previous exact analysis of the double-loop output quantization
noise for dc irrational inputs [...] was
done in [8], where it is demonstrated that the output quantization noise
was as predicted by the white-noise model. A similar exact analysis for
the sinusoidal input case for two-stage sigma-delta coders [...]
in [11], where it was shown that the quantization noise spectrum
[...] discrete spectral lines. Two-stage and a multibit
double-loop coder were shown to be mathematically equivalent in
[...] quantizer is sufficient to [...] no overloading over
the entire input amplitude range [7], [8]. Practical implementations
of multibit sigma-delta modulators have received significant attention
recently because a higher SNR (signal-to-noise ratio) as well as lower
out-of-band noise can be obtained [12]-[15].

The previous exact analyses are based on applying ergodic theory
to explicitly determine the asymptotic autocorrelation of [...],
then to find the quantization noise spectrum
from the autocorrelation function. A difficulty with this method is
[...] assumptions must be made with respect to the irrationality of
some of the input parameters. However, it is shown in [3]-[5] that
rational inputs may give rise to limit cycles of the coder
[...] to peak near some of these rational values.
Consequently, the rational inputs may be relevant to understanding
certain quantization noise phenomena. This is particularly the case
for D/A converters where the input is rational.

In this paper, we present an alternative, direct derivation of the
exact double-loop sigma-delta quantization noise with sinusoidal
input assuming no overloading. The method used here is to apply
a Fourier series representation of the quantization error function
[...]. For a sinusoidal input with irrational frequency and
amplitude, the result agrees with the ergodic theory results. However,
an exact formula for the output noise spectrum is also proved for
inputs with rational frequency and amplitude. Furthermore, the period
of the output is explicitly calculated for a coder with rational initial
conditions and rational dc input.

In Section II, we present the architecture of the sigma-delta
[...]
to the one used in the ergodic theory analyses, is developed for
the sigma-delta coder. In Section IV, the noise spectrum formula
is derived based on replacing the quantizer in the open-loop model
by its Fourier series representation. In Section V, this noise formula is
compared to the formula predicted by the white-noise [...] the ergodic
theory result. Simulation results, which confirm the analysis, are also
presented in this section.